In my previous post How Replay MCP Helped Find a React Bug Faster Than Dan Abramov Did, I showed how agents were able to successfully identify and fix a difficult React internals bug using just Replay recordings of the reproduction, and how different prompts and instructions resulted in faster analysis results.

However, most of us aren’t actually working on React’s internals :) We’re using React in apps , and dealing with app-level concerns: page loading times, network requests, good UX, and data correctness. What happens when you use AI to debug those kinds of problems? Can Replay MCP make agents better at finding real-world app bugs?

Background: Nadia’s “Debugging with AI” Example

Nadia Makarevich is author of the book Advanced React, as well as the excellent blog Developer Way, where she’s published numerous posts on React’s behavior and usage. She recently wrote a post titled “Debugging with AI: Can It Replace an Experienced Developer?”, where she tested LLM debugging abilities against an example Next.js app that had several representative bugs inserted to recreate problems she’d seen in real apps. She talked through her instructions to the LLM, how well it did solving each bug, and described where it came up with insufficient answers or just failed to solve the problem at all.

Nadia used Opus 4.5 to do realistic “blind” investigations on three different app bugs: a Zod schema error, a double loading skeleton, and a Next.js redirect error. She graded the LLM’s results in three ways:

Did the LLM fix the problem?

Did the LLM identify the root cause of the problem correctly?

Did the LLM fix the root cause of the problem correctly?

Her results and final analysis indicated that while the LLM had some success, it struggled to understand the deeper architectural issues and either failed to identify the root cause or gave misleading answers:

	User Page is Broken	Double skeletons problem	Weird Redirect Error
Did LLM fix the problem?	✅	✅	❌
Did LLM identify the root cause of the problem correctly?	✅	❌	❌
Did LLM fix the root cause of the problem correctly?	✅	❌	❌

As she concluded:

💡

So, what's the verdict? Can AI replace an experienced developer for debugging?

No, of course not. AI is very good at pattern recognition, and it can be very, very useful. It will do exceptionally well on standard stuff: schema validation mishaps, forgotten null checks, common runtime errors, you've seen it for yourself already

But even if the fix works, I always retrace it step by step and make sure it's the actual root cause. If the fix doesn't work, however, I almost never iterate with the LLM, other than asking it to find me something in the repo. As you've seen, half of its responses in this case will be very confident hallucinations.

When the problem requires actual understanding of why the system behaves the way it does, or how it should behave, especially from a future or user perspective, the AI fails.

And the skill here isn't knowing how to prompt better. It's knowing when to stop prompting and start thinking.

I took this as a challenge :) I wanted to see if adding Replay recordings and Replay MCP to the mix would help an agent solve these problems any better. I also wanted to see if improving prompts and providing more context could produce better results.

Agentic Time-Travel Debugging with Replay MCP

Let’s step back and recap what Replay MCP actually is.

Replay is a time-travel debugger for web apps. The Replay recording browser captures DVR-style recordings of an entire web app. Because we capture the entire browser’s behavior and inputs, we can replay the app’s execution exactly the way it ran during the original recording. Unlike session replay or prod monitoring tools, Replay lets you do time-travel debugging and inspect the app’s behavior at any point in time: see which lines of code executed, evaluate log statements for every time a line ran, view the DOM tree and React component tree at any point in time, and much more. This makes it possible to investigate and solve bugs in ways no other tool can.

Replay DevTools is our debugging UI for humans - browser devtools with time-travel built in. Replay MCP gives agents those same time-travel debugging capabilities. Agents can open a recording and use the MCP tools to investigate the same way a human would: looking at console messages, adding logpoints to evaluate expressions each time a line of code ran, getting screenshots and stack traces, and getting framework-specific insights into libraries like React, Redux, Zustand, and TanStack Query. This means agents can now do the investigation work for you automatically!

Investigation Process and Setup

For this experiment, I used my own personal agent setup: OpenCode 1.4 and Opus 4.6. I have some file search and context management plugins enabled, but otherwise no specific skills or custom behaviors.

I wanted to run this as a proper experiment: how much do prompts and investigation instructions matter, and how much does Replay MCP help? Bug #1 was pretty trivial, but for Bug #2 and Bug #3, I kicked off four parallel agent investigation sessions. Each agent session was given the same access to the Replay recordings of the bug and a local copy of the demo app source, but with varying prompt instructions:

Light details, bug repro, only told to “write a bug report with a root cause and suggested fix”

Same repro steps, but a detailed 8-step investigation methodology including explaining why the problem is happening

Additional context on the app’s architecture and patterns

Additional relevant framework skills or concepts

How would they do? What differences would we see between them in results or investigation times?

Bug #1: Zod Schema Error in a User Profile

Bug #1 was pretty trivial. Navigate to /users/1 , and there’s a Zod schema error shown in both a UI error overlay and in the console messages. Nadia’s initial prompt was:

💡

When I navigate to the /users/1 route, the page shows a "Something went wrong" error. In the browser console, I see the following errors [copy-paste the entire error stack]. Server logs are clean. Can you fix it?

Her agent found this error immediately and correctly identified the fix: making the user address fields optional, and handling the type changes when rendering user.address .

My agent was given somewhat more detailed instructions on investigating and using the MCP, but this one was a gimme anyway. It called the RecordingOverview MCP tool, immediately saw the Zod error, used Screenshot to confirm the UI overlay, then used the NetworkRequest tool to check the actual API response and confirmed the address and phone fields were missing despite the schema marking them as required. It nailed the issue and cause in about 2 minutes.

Bug #1: Agent prompt

Bug #1: Agent final analysis

As expected — API routes run on the server and aren't in the browser recording. But we already have the response body, which is the definitive evidence.

Here's the complete bug analysis:

Bug Report

1. Bug Description

When the user navigates to the User Profile page (by clicking around ~7s into the recording), the page crashes and shows a Next.js error boundary instead of the user's profile. The user sees an error screen rather than their profile information.

2. Root Cause

Schema-API mismatch. The UserSchema Zod validator in src/lib/schemas.ts:238-255 requires fields that the /api/user endpoint doesn't return.

The schema expects (among other fields):

phone: z.string() (line 242) — required string

address: z.object({...}) (line 246-252) — required object with street, city, state, postalCode, country

The API returns:

json
{
  "id": 1,
  "name": "John Doe",
  "email": "john.doe@example.com",
  "department": "Engineering",
  "avatarUrl": null,
  "role": "admin",
  "joinedAt": "2023-03-15T09:00:00Z",
  "lastLoginAt": "2024-01-28T14:32:00Z"
}

No phone field. No address field.

The crash chain:

User navigates to the profile page

UserPage component (line 56) calls useUser() → TanStack Query calls fetchUser()

fetchUser() in src/queries/useUser.ts:11 calls UserSchema.parse(data) on the API response

Zod throws ZodError with two issues: phone expected string got undefined, address expected object got undefined

TanStack Query surfaces this as error in the query result

UserPage (line 62-63) checks if (error) and re-throws it: throw error

Next.js ErrorBoundaryHandler catches the thrown error and renders the error screen

Lines 66+ (actual profile rendering) execute 0 times — never reached

3. Suggested Fix

Option A: Fix the API — Add the missing phone and address fields to the /api/user response:

typescript
// In the API route handler for /api/user
return Response.json({
  id: 1,
  name: "John Doe",
  email: "john.doe@example.com",
  phone: "+1-555-0123",            // ADD
  department: "Engineering",
  avatarUrl: null,
  role: "admin",
  address: {                        // ADD
    street: "123 Main St",
    city: "San Francisco",
    state: "CA",
    postalCode: "94105",
    country: "US",
  },
  joinedAt: "2023-03-15T09:00:00Z",
  lastLoginAt: "2024-01-28T14:32:00Z",
});

Option B: Fix the schema — Make phone and address optional in UserSchema so it tolerates the current API response:

typescript
// src/lib/schemas.ts
export const UserSchema = z.object({
  id: z.number(),
  name: z.string(),
  email: z.string().email(),
  phone: z.string().optional(),              // was: z.string()
  avatarUrl: z.string().url().nullable(),
  role: z.enum(['admin', 'manager', 'user']),
  department: z.string().nullable(),
  address: z.object({                        // was: z.object({...})
    street: z.string(),
    city: z.string(),
    state: z.string(),
    postalCode: z.string(),
    country: z.string(),
  }).optional(),                             // ADD .optional()
  joinedAt: z.string(),
  lastLoginAt: z.string().nullable(),
});

Option A is better if the data should exist. Option B is better if the data is genuinely optional.

4. Key Evidence Trail

Step	Tool	Finding
1	`RecordingOverview`	ZodError for `phone` and `address` fields in console errors at ~8.2s
2	`ConsoleMessages` (errors)	Error originates in `fetchUser` at `useUser.ts:15`, caught by `<UserPage>` error boundary
3	`UserInteractions`	User clicks at ~7s (navigating to profile page)
4	`NetworkRequest` (detail for `/api/user`)	Response body has no `phone` or `address` fields, returns 200 OK
5	`ReadSource` (schemas.ts:238)	`UserSchema` requires `phone: z.string()` and `address: z.object({...})` — both non-optional
6	`ReadSource` (useUser.ts)	`fetchUser()` calls `UserSchema.parse(data)` which throws on the missing fields
7	`ReadSource` (UserPage.tsx:55)	Component re-throws the query error at line 63; lines 66+ (profile rendering) have 0 execution hits — never reached
8	`Screenshot` at 15s	Confirms user sees an error boundary crash screen, not the profile

Bug #2: User Page Double Loading Skeleton

After fixing the first issue, Nadia described the second bug in the app: throttle the network connection, navigate from the root page to a user profile, and see two different loading skeleton flash on screen before the final content is displayed. But, loading the user profile directly only shows one skeleton.

Nadia’s prompt was:

💡

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

She reported that her agent came up with a variety of answers, several of which involved switching from TanStack Query’s useQuery hook to useSuspenseQuery . That answer is half right, but switching to useSuspenseQuery uncovered a second problem.

The double skeletons had to do with mixing Next’s loading.tsx page Suspense boundary, with an app-level skeleton component based on the TSQ isLoading flag. However, just switching to useSuspenseQuery introduces a hydration mismatch error down the line. The real fix is architectural - actually prefetching data on the server, and hydrating the query hook on the client side with the same data.

Given that, I wanted to see: would an agent with Replay MCP actually find the real root cause and suggest server prefetching as the full solution?

I kicked off three agent sessions: Nadia’s baseline prompt, additional investigative method instructions, and method + some app architecture context. The results were partially an improvement on Nadia’s agent - here’s how they did.

Bug #2 Agent #1: Basic Prompt

Agent #1 started by inspecting the Replay recording to understand the overall behavior. It called RecordingOverview to get the initial details, UserInteractions to get a list of clicks, and and made numerous Screenshot to get visuals on the navigation. It then used ListSources and SearchSources to identify the Users page routes and checked for Suspense handling.

A ReadSource call for UserPage.tsx returned hit counts showing it had rendered twice, once with isLoading: true. The agent then made multiple ReactRenders tools calls to check the actual render sequence and triggers. The render details showed a mixture of Suspense and loading-state based causes, confirming the mismatch. Finally, it was able to read the local repo files on disk to confirm the available hooks.

Overall, this agent did a reasonable actual investigation, came up with real evidence, wrapped up the search and reported. The answer was directionally correct, just not complete enough.

Total time: 5 minutes

Bug #2 Agent #1 Prompt

Bug #2 Agent #1 Solution

💡

Bug Report: Double Loading Skeleton on Client-Side Navigation to User Page

1. Bug Description

When navigating from the root page (/) to the User page (/users/[id]) via client-side navigation, the user sees two different loading skeletons in sequence:

First, a simpler skeleton from loading.tsx (avatar circle + name bar + content area)

Then, a more detailed skeleton from UserPageSkeleton inside the component (avatar + name + badge + separator + grid)

On a direct page refresh of the User page URL, only one skeleton appears. The visual "flash" between two different skeleton designs during navigation is jarring.

2. Root Cause

There are two independent loading mechanisms that fire sequentially during client-side navigation:

Mechanism 1 — Next.js loading.tsx Suspense boundary (src/app/users/[id]/loading.tsx)

The page component at src/app/users/[id]/page.tsx:5-10 is an async server component:

typescript
export default async function Page({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;  // <-- this `await` causes suspension
  return <UserPage userId={id} />;
}

During client-side navigation, Next.js needs to fetch the server component payload. The await params causes the route segment to suspend, which triggers the loading.tsx Suspense fallback. This renders Skeleton #1 (commit 135 at 8.89s — RenderSuspended trigger, 145 DOM mutations).

Mechanism 2 — Component-level isLoading guard (src/components/pages/UserPage.tsx:57-59)

typescript
const { data: user, isLoading, error } = useUser();  // useQuery, NOT useSuspenseQuery

if (isLoading) {
  return <UserPageSkeleton />;  // <-- Skeleton #2
}

When the Suspense boundary resolves (server component payload arrives), UserPage mounts. The useQuery hook starts fetching /api/user and immediately returns isLoading: true, rendering Skeleton #2 (commit 136 at 9.02s — SuspenseResumed trigger).

Why refresh works fine: On a full page refresh, the async server component resolves server-side before sending HTML. The client only sees the useQuery loading state — there's no sequential Suspense → component loading transition.

3. Suggested Fix

The cleanest fix is to switch from useQuery to useSuspenseQuery so that the data fetching integrates with the existing Suspense boundary. The useUserSuspense hook already exists in src/queries/useUser.ts:22-27 but is unused.

In src/components/pages/UserPage.tsx:

diff
- import { useUser } from '@/queries/useUser';
+ import { useUserSuspense } from '@/queries/useUser';

  export function UserPage({ userId }: UserPageProps) {
-   const { data: user, isLoading, error } = useUser();
-
-   if (isLoading) {
-     return <UserPageSkeleton />;
-   }
-
-   if (error) {
-     throw error;
-   }
+   const { data: user } = useUserSuspense();

With useSuspenseQuery:

The component suspends while data is loading, keeping the loading.tsx skeleton visible

When data arrives, the component renders with the actual user data — no intermediate skeleton

Error handling is automatic (suspense queries throw to the nearest error boundary)

The UserPageSkeleton function and isLoading branch can be removed entirely (or kept for other uses)

Both navigation and refresh paths show a single, consistent loading skeleton from loading.tsx

Alternative fix (if you want to keep useQuery): Delete src/app/users/[id]/loading.tsx so only the component-level skeleton appears. But this is worse because you'd lose the instant loading feedback during navigation — there'd be a blank delay before the component even mounts.

4. Key Evidence Trail

Step	Evidence	Tool/Source
Identified two skeletons	React commits 135 + 136 in the recording	`ReactRenders` commits mode, time range 8.5-10s
Commit 135: `RenderSuspended` trigger	`loading.tsx` fallback shown with 145 DOM mutations at 8.89s	`ReactRenders` commit detail
Commit 136: `SuspenseResumed` trigger	`UserPage` mounts, renders `UserPageSkeleton` (8 Skeleton children) at 9.02s	`ReactRenders` commit detail
Commit 137: Data arrives	`UserPage` re-renders with actual user data at 9.72s	`ReactRenders` commit detail
`useQuery` vs `useSuspenseQuery`	`useUser()` uses `useQuery` — doesn't integrate with Suspense	Source: `src/queries/useUser.ts`
Unused suspense hook exists	`useUserSuspense()` already defined but not imported	Source: `src/queries/useUser.ts:22-27`
`page.tsx` is async	`await params` causes server component to suspend during navigation	Source: `src/app/users/[id]/page.tsx`

Bug #2 Agent #2: Investigative Process

Agent #2 was given a more detailed 8-step investigative process to follow. How much of a difference would that make?

Agent #2 started with the same initial tool calls: RecordingOverview to orient itself, UserInteractions and Screenshot to understand the usage. It also used NetworkRequest to check the timestamps of calls to /api/user and /api/current-user.

After calling ListSources to check the files in the recording, it opted to read them all from the local filesystem.

From there, it used ReactRenders to focus in on the critical Suspense-related renders and causes, but also started reading the various Next routes to see how they handled loading patterns. It identified that most routes did have an isLoading check, but only the users/[id] route had a loading.tsx file included, explaining why the bug is specific to the User page.

This agent came up with much better explanations and relevant details in its report. It identified a waterfall request happening, and found the loading skeleton mismatch. Its primary suggestion was the same incomplete useSuspenseQuery fix, but it did explicitly say “Ideal optimization would be server-side prefetching to eliminate the waterfall”.

Total time: 9 minutes

Bug #2 Agent #2 Prompt

💡

Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

Your Mission

Investigate the recording to find the root cause and suggest a complete fix.

Tools Available

Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording

Investigation Process

Follow this structured approach:

Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.

Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.

Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.

Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.

Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:

Does the fix eliminate unnecessary work, or just hide it?

Is there a sequential dependency (waterfall) that shouldn't exist?

Would the fix match how other similar pages in the app work?

Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.

Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used?

Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?

Deliverable

Write up your findings:

Bug description — what the user experiences in each scenario

Root cause analysis — the specific code/logic/architectural error, with evidence from the recording

Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.

Key evidence trail — the chain of observations that led to your diagnosis

Remaining concerns — anything that your fix doesn't address, or that you're uncertain about

Bug #2 Agent #2 Solution

💡

Bug Report: Double Loading Skeleton on User Page Navigation

1. Bug Description

What the user sees:

Navigation (root → User page): Two visually distinct loading skeletons appear in sequence:

A small card skeleton (12px avatar, narrow name placeholder) for ~130ms

A larger, more detailed skeleton (24px avatar, separator, 2-column grid) for ~700ms

Then the actual user profile renders

Page refresh on User page: Only one loading skeleton appears before the data renders.

The two skeletons have different layouts, sizes, and levels of detail, making the transition jarring.

2. Root Cause Analysis

The bug is a conflict between two independent loading mechanisms that both activate during client-side navigation but not during SSR/refresh.

The two loading sources:

#	Source	File	Trigger
1	Next.js Suspense boundary	`src/app/users/[id]/loading.tsx`	Route navigation triggers Suspense while RSC payload + JS chunks load
2	Client-side `isLoading` check	`src/components/pages/UserPage.tsx:63-65`	`useUser()` (non-suspense `useQuery`) starts fetching `/api/user` after mount

Why navigation shows two skeletons:

User clicks "User" link → Next.js starts client-side route transition

Commit 135 (8.89s): Next.js hits the Suspense boundary at users/[id]/loading.tsx, renders Skeleton #1 while loading the RSC payload and JS chunks

Commit 136 (9.02s): RSC resolves, UserPage component mounts. But useUser() uses useQuery (not suspense-aware), so it returns isLoading: true immediately and renders Skeleton #2 (UserPageSkeleton)

Commit 137 (9.72s): /api/user response arrives (fetched at 9172ms, took 548ms), UserPage re-renders with actual data

Why refresh shows one skeleton:

On a full page load, Next.js SSR renders the server component synchronously. The loading.tsx Suspense boundary either doesn't trigger or resolves instantly during SSR. The client hydrates with UserPage already mounted, and only the client-side useQuery loading state is visible.

The architectural error: UserPage uses useQuery (non-suspense) despite having a Next.js loading.tsx Suspense boundary in the route. This creates two competing loading state mechanisms. The codebase already has useUserSuspense() (using useSuspenseQuery) in src/queries/useUser.ts — it's just not being used.

3. Suggested Fix

Primary fix: Switch UserPage to use useSuspenseQuery and remove the manual loading state handling. This lets the existing loading.tsx Suspense boundary handle all loading UI.

src/components/pages/UserPage.tsx:

diff
- import { useUser } from '@/queries/useUser';
+ import { useUserSuspense } from '@/queries/useUser';

  export function UserPage({ userId }: UserPageProps) {
-   const { data: user, isLoading, error } = useUser();
-
-   if (isLoading) {
-     return <UserPageSkeleton />;
-   }
-
-   if (error) {
-     throw error;
-   }
+   const { data: user } = useUserSuspense();

    if (!user) {

The UserPageSkeleton function (lines 27-55) can also be removed from this file since it's no longer used — the loading.tsx skeleton handles that role.

Additional consideration: The loading.tsx skeleton is currently simpler/smaller than UserPageSkeleton. You may want to update src/app/users/[id]/loading.tsx to match the richer skeleton layout (24px avatar, separator, 2-column grid) so the loading state better matches the final rendered page.

Why this is the correct approach:

Other query files (useOrder.ts, useProduct.ts) also export both useX() and useXSuspense() variants — but those pages don't have loading.tsx files, so the non-suspense version works fine

The users/[id]/loading.tsx file exists specifically to be the loading UI for this route — it should be the single source of loading state

useSuspenseQuery integrates with React Suspense natively, which is exactly how Next.js loading.tsx works (it wraps the page in a <Suspense> boundary)

4. Key Evidence Trail

Screenshots at 8742ms vs 9187ms showed two visually different skeletons appearing in sequence

React render commits 135→136→137 showed the exact sequence: RenderSuspended (loading.tsx) → SuspenseResumed + UserPageSkeleton mount → StoreRerender with data

Network timeline showed /api/user fetch doesn't start until 9172ms (~580ms after click), proving the client-side fetch only begins after the component mounts

Source code confirmed useUser() uses useQuery while useUserSuspense() using useSuspenseQuery exists but is unused

File comparison showed users/[id]/loading.tsx exists but orders/[id]/loading.tsx and products/[id]/loading.tsx do not — explaining why this bug is unique to the User page

5. Remaining Concerns

Waterfall still exists with useSuspenseQuery: The /api/user fetch still won't start until the client component mounts after RSC resolution. During the Suspense phase, the loading.tsx skeleton shows while the RSC payload loads, then Suspense resolves, UserPage mounts, useSuspenseQuery triggers the fetch, and Suspense suspends again while the fetch completes. This means the loading.tsx skeleton appears for the full duration (RSC + API fetch), which is better UX (one consistent skeleton) but the fetch itself is still sequential after RSC.

Ideal optimization would be server-side prefetching: The best Next.js pattern would be to prefetch user data on the server (e.g., using TanStack Query's prefetchQuery in the server component at users/[id]/page.tsx) and dehydrate it. This would eliminate the client-side fetch waterfall entirely — the data would already be in the cache when useSuspenseQuery runs, so no second Suspense suspension would occur.

loading.tsx skeleton fidelity: The current loading.tsx skeleton is less detailed than UserPageSkeleton. After the fix, users will see the simpler skeleton for the entire loading duration, which might feel like a regression in loading UI quality. Consider updating loading.tsx to match the richer layout.

Bug #2 Agent #3: App Architecture Summary

Agent #3 was given the same investigative steps, but also a few bullet points summarizing the known packages and structure of this app (Next.js + TanStack Query, mix of RSCs and hook loading).

Most of the investigative flow was the same: RecordingOverview, interactions, screenshots, sources, network requests. It then specifically browsed most of the query hook definitions and confirmed that Suspense-enabled query hooks existed, as well as checking more screenshots to confirm the loading skeleton visuals.

This agent came up with the fastest response. It still recommended useSuspenseQuery, but did also list server prefetching as a follow-up step. It also flagged OrderPage and ProductPage as having the same problematic pattern, and that the UserPageSkeleton component would be dead and could be removed.

Total time: 4 minutes

Bug #3 Agent #3 Prompt

💡

Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

Your Mission

Investigate the recording to find the root cause and suggest a complete fix.

Tools Available

Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording

Application Context

Pre-analysis of the repository shows:

Framework: Next.js 14+ using the App Router (app/ directory)

Data fetching: TanStack Query (React Query) for client-side data management

Component model: Mix of React Server Components (RSC) and Client Components

Loading states: Route-level loading.tsx files provide framework-managed loading UI

Route structure: The User Profile page is at src/app/users/[id]/, with its own loading.tsx

Query hooks: The app defines both standard and suspense-aware variants of its query hooks (e.g. useUser and useUserSuspense)

Investigation Process

Follow this structured approach:

Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.

Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.

Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.

Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.

Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:

Does the fix eliminate unnecessary work, or just hide it?

Is there a sequential dependency (waterfall) that shouldn't exist?

Would the fix match how other similar pages in the app work?

Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.

Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used? Pay particular attention to the boundary between server and client — where does data originate, and is it being passed efficiently across that boundary?

Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?

Deliverable

Write up your findings:

Bug description — what the user experiences in each scenario

Root cause analysis — the specific code/logic/architectural error, with evidence from the recording

Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.

Key evidence trail — the chain of observations that led to your diagnosis

Remaining concerns — anything that your fix doesn't address, or that you're uncertain about

Bug #2 Agent #3 Solution

💡

Bug Report: Double Loading Skeleton on Navigation to User Page

1. Bug Description

What the user experiences:

Navigation (root → User page): Two visually distinct loading skeletons appear in sequence:

Skeleton #1 (~130ms, 9063-9187ms): A dashboard-style skeleton with 4 stat card placeholders and a content grid — this is the root app/loading.tsx fallback, completely wrong for the user page context.

Skeleton #2 (~530ms, 9187-9720ms): A user profile card skeleton with avatar circle and name bars — this is the UserPageSkeleton rendered inside UserPage while useUser() fetches data.

Hard refresh on User page: Only one skeleton appears (the UserPageSkeleton from useUser()'s loading state). The transition is clean.

2. Root Cause Analysis

The problem has two layers:

Layer 1: Mismatched Suspense boundary (wrong skeleton appears)

During client-side navigation, Next.js lazy-loads the page component's JS chunk. While the chunk for app/users/[id]/page.js downloads (~323ms), React suspends. The framework looks for the nearest Suspense boundary with a fallback — but the route-level app/users/[id]/loading.tsx hasn't loaded yet either. So React falls back to app/loading.tsx (the root loading file), which shows a dashboard-style skeleton for a user profile page. This is visually jarring and contextually wrong.

Evidence from React commits:

Commit 135 (8.89s): RenderSuspended — React.lazy init for page.js chunk. 145 DOM mutations — this is the heavy swap to the root skeleton.

Commit 136 (9.02s): SuspenseResumed — chunk loaded, UserPage mounts.

Layer 2: Non-suspense data fetching creates a second loading phase

UserPage uses useUser() — a standard useQuery hook that returns { isLoading: true } initially. The component manually checks isLoading and renders <UserPageSkeleton />. This creates a sequential waterfall:

plain text
Navigation click (8.59s)
  → RSC navigation (8.67s)
    → Page JS chunk download (323ms)
      → Root skeleton shown (Skeleton #1)
        → UserPage mounts (9.02s)
          → useUser() starts fetch
            → UserPageSkeleton shown (Skeleton #2, 500ms API delay)
              → Data arrives, final render (9.72s)

On hard refresh, the JS chunk is included in the initial SSR payload, so the chunk-loading Suspense phase is skipped entirely. The user only sees the useUser() loading state.

Evidence from network timeline:

/api/user fetch doesn't start until 9172ms — 505ms after the navigation click

The 500ms API delay is on top of the chunk-loading waterfall

3. Suggested Fix

The app already has the correct solution defined but unused. Every query hook file exports a useSuspenseQuery variant. The fix is to use it.

Primary fix: Switch `UserPage` to use `useUserSuspense()`

src/components/pages/UserPage.tsx:

typescript
// Change import:
- import { useUser } from '@/queries/useUser';
+ import { useUserSuspense } from '@/queries/useUser';

// Change hook usage in UserPage component:
- const { data: user, isLoading, error } = useUser();
-
- if (isLoading) {
-   return <UserPageSkeleton />;
- }
-
- if (error) {
-   throw error;
- }
-
- if (!user) {
+ const { data: user } = useUserSuspense();
+
+ if (!user) {

This change:

Makes data fetching participate in React Suspense

Next.js's app/users/[id]/loading.tsx becomes the single loading boundary for both chunk loading AND data fetching

The user sees one contextually-correct skeleton (user profile card) instead of two mismatched ones

The UserPageSkeleton inline component and isLoading check become unnecessary (can be removed)

Error handling is automatic — useSuspenseQuery throws on error, which React error boundaries catch

Same pattern applies to other pages

OrderPage and ProductPage have the identical issue. They should also switch to their suspense variants (useOrderSuspense, useProductSuspense).

4. Key Evidence Trail

Screenshots at 9063ms vs 9187ms showed two visually distinct skeletons in rapid succession

React render commits 135-137 confirmed the three-phase sequence: suspend (chunk) → resume (mount + fetch) → data arrives

Network timeline showed /api/user starting at 9172ms, 505ms after click — confirming the waterfall

Source code revealed useUser() (non-suspense) being used despite useUserSuspense() being defined in the same file

app/loading.tsx confirmed the root skeleton is a dashboard layout — wrong for the user page context

All query hooks export unused suspense variants — the codebase was designed for this pattern but didn't follow through

5. Remaining Concerns

Other pages: OrderPage and ProductPage use the same anti-pattern (useOrder/useProduct + isLoading + manual skeleton). They'll exhibit the same double-skeleton bug on client-side navigation. The fix should be applied consistently.

Server-side prefetching: Even with useSuspenseQuery, the data fetch still starts client-side after the component mounts. The ideal Next.js pattern would be to prefetch on the server (in the RSC page component) and dehydrate into the client, so the TanStack Query cache is warm before the client component mounts. This would eliminate the data-fetching loading state entirely on navigation. But switching to useSuspenseQuery is the correct first step — it fixes the double-skeleton bug and properly integrates with framework loading boundaries.

UserPageSkeleton component: After the fix, the inline UserPageSkeleton in UserPage.tsx becomes dead code. The route-level app/users/[id]/loading.tsx handles all loading UI. The inline skeleton should be removed to avoid confusion.

Bug #2: Initial Results

After running those three sessions, I was pleased to see that the agents had at least found the initial problems and identified the same useSuspenseQuery fix that Nadia’s agent had suggested. But, as she pointed out, that “fix” is incomplete and actually leads to hydration errors.

Meanwhile, it was clear that adding more codebase context and better instructions produces higher quality answers and faster results. Even just adding a few bullet points describing the rough loading patterns used in the app let Agent #3 zero in on the problem much faster.

So, what if I tried giving the agent more info on the actual technologies used in this app? More specifically, what if I gave it access to the official SKILL.md files for Next.js and TanStack Query? How would that change the results? Would the agent be able to identify the real root cause and suggest server prefetching as an actual solution?

Bug #2 Agent #4: Skill Files

I spun up Agent #4 with the same instructions as before (bug report, investigation steps, app context), but added in two extra sections: “CI context” saying this was a manual repro of a known E2E test failure and listing project dependencies, and specific pointers to the SKILL files for TanStack Query, Next.js App Router data fetching, Next.js Suspense boundaries, and Vercel’s “React best practices”.

This agent started by actively reading the skill files. From there, most of the investigation followed the same paths: Replay MCP tools to scan the recording, get screenshots, network requests, sources, and React renders. It also cross-checked all of the query hook files and the different app pages.

However, this time the result was different. While it still recommended useSuspenseQuery as the primary fix, the agent explicitly recommended “Deeper fix: eliminate the waterfall with server-side prefetching” and gave a copy-pasteable example showing use of queryClient.prefetchQuery() in the server component that exactly matched Nadia’s recommended fix! That is the correct answer!

This agent also identified several other concerns, including the Order and Product pages having the same issue, the query key not including userId, and other loading state issues.

Total time: 5 minutes

Bug #2 Agent #4 Prompt

💡

Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

Your Mission

Investigate the recording to find the root cause and suggest a complete fix.

Tools Available

Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording

CI Context

This was reported from a navigation E2E test for the User Profile page that detected unexpected loading state behavior. The recording captures a manual reproduction of the issue.

Key dependencies:

Next.js ^15.5.12 (App Router)

React ^19

@tanstack/react-query ^5.90.16

TypeScript ^5.3.0

Application Context

Pre-analysis of the repository shows:

Framework: Next.js 14+ using the App Router (app/ directory)

Data fetching: TanStack Query (React Query) for client-side data management

Component model: Mix of React Server Components (RSC) and Client Components

Loading states: Route-level loading.tsx files provide framework-managed loading UI

Route structure: The User Profile page is at src/app/users/[id]/, with its own loading.tsx

Query hooks: The app defines both standard and suspense-aware variants of its query hooks (e.g. useUser and useUserSuspense)

Technology Skills

You have been provided with best-practices skills for the technologies in this app. Read the smaller skill files before starting your investigation — they contain framework-specific patterns and idioms that may be relevant:

~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including SSR, hydration, and query hook variants

~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns and decision trees

~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and CSR bailout rules

~/projects/temp/react-skills/skills/react-best-practices.md — Comprehensive React best practices (LARGE file, ~3800 lines). Do NOT read this fully — scan headers with rg "^#{1,3} " first, then read only sections relevant to the issue you're investigating.

Investigation Process

Follow this structured approach:

Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.

Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.

Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.

Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.

Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:

Does the fix eliminate unnecessary work, or just hide it?

Is there a sequential dependency (waterfall) that shouldn't exist?

Would the fix match how other similar pages in the app work?

Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.

Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used? Pay particular attention to the boundary between server and client — where does data originate, and is it being passed efficiently across that boundary?

Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?

Deliverable

Write up your findings:

Bug description — what the user experiences in each scenario

Root cause analysis — the specific code/logic/architectural error, with evidence from the recording

Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.

Key evidence trail — the chain of observations that led to your diagnosis

Remaining concerns — anything that your fix doesn't address, or that you're uncertain about

Bug #2 Agent #4 Solution

💡

Bug Report: Double Loading Skeleton on User Page Navigation

1. Bug Description

What the user sees:

Navigating to the User page (e.g., from dashboard): Two visually distinct loading skeletons appear in sequence — a simpler card skeleton (~130ms), then a more detailed profile skeleton (~700ms) — before the actual content renders.

Refreshing the User page directly: Only one skeleton (the detailed profile skeleton) appears, then content renders.

The two skeletons have different layouts (different card structure, different number of skeleton elements), creating a jarring visual transition during navigation.

2. Root Cause Analysis

The bug stems from two independent, overlapping loading mechanisms competing for the same UI slot:

Loading mechanism #1: Next.js route-level `loading.tsx`

src/app/users/[id]/loading.tsx provides a Suspense fallback for the route segment. During client-side navigation, Next.js lazy-loads the page chunk via React.lazy. While the JS chunk is downloading (323ms), React suspends and shows this loading.tsx skeleton.

Loading mechanism #2: Component-level `isLoading` check

UserPage (a 'use client' component) uses useUser() — which is standard useQuery, NOT useSuspenseQuery. This means:

The component renders immediately with isLoading: true

It manually returns <UserPageSkeleton /> during loading

When the fetch completes, it re-renders with actual data

Why navigation shows two skeletons:

Time	Event	Visible
8588ms	User clicks navigation link
8667ms	RSC navigation fetch starts
8764ms	Page JS chunk fetch starts
8890ms	React.lazy suspends → `loading.tsx` shown	Skeleton #1 (route-level)
9020ms	Chunk loaded, `UserPage` mounts, `useUser()` fires
9020ms	`isLoading: true` → `UserPageSkeleton` shown	Skeleton #2 (component-level)
9172ms	`/api/user` fetch actually starts
9720ms	Data arrives, content renders	User profile

Why refresh shows only one skeleton:

On a full page load, the page JS is served as part of the initial HTML/SSR bundle. There's no lazy-loading, so loading.tsx never triggers. UserPage mounts immediately, useUser() returns isLoading: true, and only the component-level UserPageSkeleton appears.

The architectural mismatch:

The real issue is that UserPage uses useQuery (non-suspense) while also having a route-level loading.tsx (which implies Suspense-based loading). These are two different loading paradigms:

loading.tsx = "the framework manages loading via Suspense boundaries"

useQuery + manual isLoading check = "the component manages its own loading state"

Using both creates the double-skeleton problem during navigation.

3. Suggested Fix

Immediate fix: Switch to `useSuspenseQuery`

The app already defines useUserSuspense() — it's just not being used. The fix is to use it, which unifies the loading mechanism under Suspense:

src/components/pages/UserPage.tsx:

typescript
// BEFORE:
import { useUser } from '@/queries/useUser';

export function UserPage({ userId }: UserPageProps) {
  const { data: user, isLoading, error } = useUser();

  if (isLoading) {
    return <UserPageSkeleton />;
  }

  if (error) {
    throw error;
  }
  // ...
}

// AFTER:
import { useUserSuspense } from '@/queries/useUser';

export function UserPage({ userId }: UserPageProps) {
  const { data: user } = useUserSuspense();
  // No isLoading check needed — Suspense handles it
  // No error check needed — useSuspenseQuery throws to error boundary

  if (!user) {
    return (/* user not found UI */);
  }
  // ... rest of the render
}

With useSuspenseQuery:

During navigation, when UserPage mounts, the query suspends → React shows the same loading.tsx fallback that was already showing for the chunk load. The user sees ONE continuous skeleton.

During refresh, the query suspends on first render → loading.tsx shows ONE skeleton.

Both paths now show the same single skeleton.

The UserPageSkeleton component and the isLoading branch can be removed entirely.

Deeper fix: Eliminate the waterfall with server-side prefetching

Even after the immediate fix, there's still a waterfall: the /api/user fetch only starts after the page JS loads and UserPage mounts (the fetch started at 9172ms, ~500ms after navigation). The ideal pattern in Next.js App Router is to prefetch data on the server and hydrate it:

src/app/users/[id]/page.tsx:

typescript
import { HydrationBoundary, dehydrate } from '@tanstack/react-query';
import { getQueryClient } from '@/providers/getQueryClient';
import { UserPage } from '@/components/pages/UserPage';

interface PageProps {
  params: Promise<{ id: string }>;
}

export default async function Page({ params }: PageProps) {
  const { id } = await params;
  const queryClient = getQueryClient();

  // Prefetch on the server — starts immediately with the RSC request
  await queryClient.prefetchQuery({
    queryKey: ['user'],
    queryFn: async () => {
      // Direct server-side data fetch (no /api roundtrip needed)
      const { getUser } = await import('@/lib/data');
      return getUser();
    },
  });

  return (
    <HydrationBoundary state={dehydrate(queryClient)}>
      <UserPage userId={id} />
    </HydrationBoundary>
  );
}

This eliminates the waterfall by fetching data in parallel with (or even before) the client JS chunk load. When UserPage mounts with useSuspenseQuery, the data is already in the cache — no suspension, no skeleton at all during navigation.

4. Key Evidence Trail

Screenshots at 8829ms vs 9063ms — Two visually distinct skeletons confirmed on screen during navigation

React render commit 135 (8.89s) — RenderSuspended trigger: "React.lazy component initialized" → loading.tsx shown (skeleton #1)

React render commit 136 (9.02s) — SuspenseResumed, UserPage mounts, renders UserPageSkeleton due to isLoading: true (skeleton #2)

React render commit 137 (9.72s) — StoreRerender from TanStack Query → data arrives, profile renders

Network timeline — /api/user fetch starts at 9172ms, 500ms after navigation click, confirming client-side waterfall

Source code — UserPage uses useUser() (standard useQuery) not useUserSuspense(), despite useUserSuspense being defined and available

Pattern comparison — OrderPage and ProductPage have the same useQuery + manual isLoading pattern but do NOT have loading.tsx files, so they only show one skeleton. The user/[id] route is the only one with both mechanisms.

5. Remaining Concerns

OrderPage and ProductPage have the same anti-pattern (just less visible). They use useQuery + manual loading when they could use useSuspenseQuery + route-level loading.tsx for a more consistent architecture. The same waterfall exists for those routes — the API fetch only starts after the component mounts.

The useUser() query key ['user'] doesn't include userId. This means all user pages share the same cache entry regardless of which user ID is in the URL. The userId prop is passed to UserPage but never used in the query. This is likely a separate bug.

No server-side prefetching anywhere in the app. The getQueryClient.ts already has dehydrate configuration including shouldDehydrateQuery for pending queries, suggesting prefetching was planned but never implemented.

The loading.tsx skeleton and UserPageSkeleton show different layouts. Even after fixing the double-skeleton issue, you should decide which skeleton design you want and consolidate. If using the Suspense approach, only loading.tsx matters.

Bug #2 Conclusions

I like Nadia’s rubric for measuring success, but I think we can expand it a bit. There’s both “did it come up with the right answers?”, and “how did it get those answers?”. (This is where I always struggled as a kid in school - I wrote the right answers, why should I have to write down my work to show how I got there? 🤣)

Looking at the four agents, here’s how we’d grade their results:

	Run 1 (baseline)	Run 2 (+methodology)	Run 3 (+context)	Run 4 (+skills)
Did it fix the problem?	✅	✅	✅	✅
Did it identify the root cause correctly?	⚠️ Proximate only	⚠️ Proximate only	⚠️ Proximate only	✅
Did it investigate beyond the first fix?	❌	✅	✅	✅
Did it check the rest of the codebase?	❌	Partial	✅	✅
Did it produce the correct architectural fix?	❌	❌	❌	✅

The “fix” rating for agents 1-3 is a bit generous given that we know that useSuspenseQuery wasn’t the complete answer. Still, it does address the immediate issue, and if that change was pushed in a PR, the next test run would have had the hydration error and eventually led to implementing the prefetching solution. So, call that a step in the right direction - it’s investigative progress that you as a developer didn’t have to do yourself.

I was impressed to see that just adding a detailed investigative process in the prompt produced better results, faster. I was very impressed to see how much of a difference adding the skills files made! I honestly didn’t know if that would help, but figured it was worth trying for comparison, and it actually did lead to a full and complete architectural fix suggestion.

Bug #3: Next.js Redirect Error

Nadia’s third bug investigation involves a redirect error. Load /users, it does correctly redirect to /users/1 so that non-admins see their own user details. However, there’s a Rendered more hooks than during the previous render React error that shows up.

Nadia’s LLM prompt was again pretty simple:

💡

When I navigate to the /users page, I see a "Something went wrong" page, and in the console I see the following error: [error code]. Investigate why the error happens and how to fix it.

She reported that her LLM completely failed in this case. It suggested a half dozen causes and “believably detailed” potential solutions, all completely wrong.

Nadia had to dig through Next’s Github issues, found some semi-relevant discussions, but no firm fix. Eventually she started ripping out code from her app, and after “multiple days”, found that the <SendAnalyticsData> component was calling a Server Action function inside of a useEffect . Apparently this was actually due to a combination of a server-side redirect, and a Server Action in progress wrapped in Suspense (per this Next.js issue ). That’s incredibly obscure - no wonder her agent couldn’t figure it out!

I created a fresh Replay recording of the bug and set up the same parallel agent sessions as before. This time, the question was: would Replay MCP would enable any of my agent sessions to even come close to describing the real bug and fix? Or would this also be too obscure even with runtime data available?

Bug #3 Agent #0: No Recording

I tried doing one agent session with just access to the source to see what would happen. Interestingly, it did label <SendAnalyticsData> was the issue, but came up with a completely bogus explanation - it claimed it had to do with component position after an early return, and said it should be moved earlier in the return logic. Clearly the source code alone wasn’t sufficient to explain what was going on here.

Bug #3 Agent #1: Basic Instructions

Agent #1 went on an incredibly deep spelunking expedition through Next and React’s internals :)

It started with the usual RecordingOverview tool, which immediately surfaced the error message. It got more details via the ReactExceptions and ConsoleMessages tools, identified the Router component as the source of the thrown error, and dug into React’s behavior with the ReactRenders tool.

After reading all of the relevant app source files, it read the source of Route via the sources tools, saw the hit counts, and jumped from there to Next’s useActionQueue internal hook. It identified all the internal hook calls and came up with 11 hooks, and flagged the line return isThenable(state) ? use(state) : state as being relevant. It then used multiple Logpoint and Evaluate calls to trace more runtime execution inside of React internal methods like updateWorkInProgressHook().

The final recommended fix was to move the redirect into middleware. This would sort of resolve the problem, but only as a bandaid - it doesn’t address the Server Action being called in a useEffect .

Still, it’s incredibly impressive how deep inside React’s internals this agent was able to investigate and how much of the actual behavior it identified in the process!

Total time: 15 minutes

Bug #3 Agent #1 Prompt

Bug #3 Agent #1 Solution

💡

Bug Report: "Rendered more hooks than during the previous render" on `/users` page

1. Bug Description

When navigating to /users, users briefly see a "Something went wrong" error page (rendered by Next.js's DefaultGlobalError component) for approximately 2 seconds before being redirected to /users/1. The browser console shows:

Error: Rendered more hooks than during the previous render.

The redirect itself is intentional — the server component at src/app/users/page.tsx fetches the current user and calls redirect(/users/${user.id}). The error flash is the bug.

2. Root Cause Analysis

This is a bug in Next.js's Router component's interaction with React's use() hook when processing server-side redirects. Here's the exact chain of events:

Step 1: Server Component throws NEXT_REDIRECT

src/app/users/page.tsx calls redirect(), which throws a NEXT_REDIRECT error. This is caught by Next.js's RedirectErrorBoundary, which renders <HandleRedirect>. That component calls router.replace(url) inside a useEffect, dispatching a navigation action.

Step 2: Router's useActionQueue stores a Promise in useState

The navigation action puts a Promise into actionQueue.state. When the Router re-renders, useActionQueue (in use-action-queue.ts) calls use(state) at line 53 where state is the Promise from useState. This causes the render to suspend.

Step 3: Suspended render saves partial hook state

Before reaching use(state), the Router calls these hooks inside useActionQueue:

useState — 1 hook

useAppDevRenderingIndicator() → useTransition() (2 hooks) + useEffect() (1 hook) — 3 hooks

Total: 4 hooks. Then use(state) suspends. React saves these 4 partial hooks on the work-in-progress fiber.

Step 4: Suspended fiber becomes the alternate

When React retries the render, the suspended fiber (with only 4 hooks) becomes the alternate (the "previous render" that React compares against).

Step 5: use() resolves → dispatcher set incorrectly

When the Promise resolves, use(state) returns a value. React's useThenable function (react-dom-client line 7004-7013) checks whether to set the Mount or Update dispatcher for subsequent hooks:

javascript
// Simplified from React source
if (workInProgressHook.next === null) {
  // No more WIP hooks after use()
  if (alternate !== null && alternate.memoizedState !== null) {
    // Alternate has hooks → use UPDATE dispatcher
    ReactSharedInternals.H = HooksDispatcherOnUpdateInDEV;
  } else {
    // No alternate hooks → use MOUNT dispatcher
    ReactSharedInternals.H = HooksDispatcherOnMountInDEV;
  }
}

The check alternate.memoizedState !== null is too coarse. The alternate does have hooks (its memoizedState is non-null — it has 4 hooks), but all 4 of those hooks have already been consumed during the current render. There are zero remaining hooks to update against. React should set the Mount dispatcher here, but instead sets the Update dispatcher.

Step 6: Next hook call throws

The Router's useMemo at line 207 (immediately after useActionQueue returns) calls updateWorkInProgressHook (the Update-mode hook handler), which does:

javascript
nextCurrentHook = currentHook.next; // null — no more alternate hooks!
if (nextCurrentHook === null) {
  throw Error("Rendered more hooks than during the previous render.");
}

Step 7: Error boundary catches, shows error page

ErrorBoundaryHandler catches this error and renders DefaultGlobalError — the "Something went wrong" page the user sees. The redirect eventually completes via a separate code path, navigating to /users/1.

3. Suggested Fix

The root cause is in React's useThenable implementation, making it a React/Next.js framework bug. However, the app can work around it.

App-level fix — move redirect to middleware:

Create src/middleware.ts:

typescript
import { NextRequest, NextResponse } from 'next/server';

export async function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/users') {
    // In a real app, read the user ID from session/cookie
    const userId = 1; // or fetch from auth
    return NextResponse.redirect(new URL(`/users/${userId}`, request.url));
  }
}

export const config = {
  matcher: '/users',
};

This performs the redirect at the HTTP level (301/302) before any React rendering occurs, completely avoiding the client-side Router render cycle that triggers the bug.

Alternative app-level fix — use permanentRedirect or notFound pattern:

Replace the server component redirect with a client-side approach:

typescript
// src/app/users/page.tsx
import { redirect } from 'next/navigation';
import { getUser } from '@/services/user';

export default async function UsersPage() {
  const user = await getUser();
  // Use permanentRedirect for HTTP-level redirect
  redirect(`/users/${user.id}`);
}

If the above still triggers the same issue (since both redirect and permanentRedirect use the same NEXT_REDIRECT mechanism), the middleware approach is the most reliable workaround.

Framework-level fix (in React's useThenable):

The check at react-dom-client.development.js line ~7004 should verify whether the alternate has unconsumed hooks remaining, not merely whether it has any hooks at all:

javascript
// Current (buggy):
if (alternate !== null && alternate.memoizedState !== null) {

// Fixed — check if currentHook (the tracking pointer into the alternate)
// has a .next, meaning there are unconsumed hooks:
if (currentHook !== null && currentHook.next !== null) {

This would correctly detect that all alternate hooks have been consumed and set the Mount dispatcher, allowing subsequent hooks (useMemo, etc.) to mount fresh rather than trying to update against non-existent previous hooks.

4. Key Evidence Trail

Step	Tool	Finding
1	`RecordingOverview`	1 console error at 1050ms: "Rendered more hooks than during the previous render"
2	`ReactRenders` (summary)	4 commits; commit 1 at 1.05s renders `DefaultGlobalError` — the error flash
3	`ReactException`	Two errors: NEXT_REDIRECT (caught by RedirectErrorBoundary) and hooks error (caught by ErrorBoundaryHandler)
4	`ConsoleMessages` (detail)	Error stack points to `updateWorkInProgressHook` in react-dom-client.development.js:6979
5	`ReadSource` (app-router.tsx)	Router component has 13+ hooks; `useActionQueue` is first, calls `use(state)` conditionally
6	`ReadSource` (use-action-queue.ts)	Line 53: `return isThenable(state) ? use(state) : state` — conditional `use()` call
7	`Logpoint` (use-action-queue.ts:29)	Confirmed `state` from useState was a thenable (Promise) during renders at 999ms-1042ms
8	`Logpoint` (app-router.tsx:207)	`useMemo` NOT reached during suspended renders (999ms, 1007ms, 1041ms) — only 4 hooks saved
9	`Logpoint` (app-router.tsx:439)	At 1042ms: useMemo IS reached but line 439 is NOT — render throws before completing
10	`Evaluate` (at useThenable Point:130)	`workInProgressHook.next === null`, `alternate.memoizedState !== null` (4 hooks), all consumed — dispatcher set to Update incorrectly
11	`DescribePoint` (Point:114)	Confirmed alternate fiber had exactly 4 hooks in memoizedState chain at the moment of the throw
12	`ReadSource` (use-app-dev-rendering-indicator.tsx)	`useTransition()` + `useEffect()` = 3 hooks, explaining the 4-hook count (useState + 3)
13	Source code (`src/app/users/page.tsx`)	Server component calling `redirect()` — the trigger for the entire chain

Bug #3 Agent #2: Investigative Methodology

Agent #2 was given the same 8-step investigative approach we used earlier with Bug #2.

And Agent #2 actually nailed the exact React behavior and suggested fix!

Agent #2 started with RecordingOverview, saw the Next redirect error, and took a few screenshots to confirm visuals. It then used ReactRenders to dig into the specific React commits around the redirect.

After using SearchSources and ReadSources to read the full Router component and its hit counts, it used ReactRenders again to get the specific React fiber ID for the Router instance. It identified the render cause as a setState call, then used DescribePoint to backtrack the cause of the setState.

As with Agent #1, it traced down into useActionQueue, then read much of the app source. This time, though, it found <SendAnalyticsData>, and immediately flagged sendAnalyticsData as a Server Action. It used a mixture of source reads + hit counts, logpoints, and dependency graph checks to confirm the timing of the calls, and concluded the server response came back while the Router was still mid-redirect.

The suggested fix was indeed replacing the Server Action with a simple fetch call for the analytics request.

Total time: 12 minutes

Bug #3 Agent #2 Prompt

💡

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Investigation Methodology

Follow this structured 8-step investigation process:

Establish user experience — Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.

Map data flow — Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.

Build render timeline — Use React analysis tools to see the sequence of component renders, mounts, unmounts. Look for unexpected render patterns, error boundaries catching, Suspense transitions.

Compare scenarios explicitly — The bug says "navigating to /users shows an error flash then redirects." Compare what happens during this navigation vs what happens if you directly load /users/1. What's different?

Find proximate cause, then go deeper — Once you find what immediately causes the error, ask "but WHY does this happen?" Keep going deeper. The first cause you find is rarely the root cause. Evaluate whether your fix addresses just the symptom or the actual root cause.

Systematic elimination — If the error source isn't clear from tracing, try identifying which components are on the page and reasoning about which one might be causing the issue. The "hooks" error means a component is rendering with a different number of hooks between renders — which component could be conditionally calling hooks? Look at layout components, providers, and components that appear on multiple pages.

Consider framework idioms — Is the code using Next.js/React/TanStack patterns correctly? Are there known interactions between features (Suspense, redirects, Server Actions, etc.) that could cause this?

Red-team your conclusion — Before finalizing, critically evaluate: Does your fix actually address the root cause? Could it introduce new problems? Are there other components or pages with the same pattern?

Deliverable

Write a detailed bug report with:

Bug Description — What the user sees

Root Cause Analysis — What exactly causes the error, traced through the code

Suggested Fix — Concrete code changes to fix the root cause

Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #2 Solution

💡

Bug Report: "Something went wrong" flash on `/users` redirect

1. Bug Description

When navigating to /users, a "Something went wrong" error page briefly flashes (~50ms, visible from ~1015ms to ~3050ms in the recording) before the expected redirect to /users/1 completes. The console logs: "Rendered more hooks than during the previous render." The error is caught by Next.js's global error boundary (ErrorBoundaryHandler), which renders global-error.tsx (the "Something went wrong" page). The redirect eventually succeeds, but the UX flash is jarring.

2. Root Cause Analysis

The chain of events:

User navigates to /users. The server component UsersPage (/app/users/page.tsx) calls await getUser() then redirect('/users/${user.id}'), which throws a NEXT_REDIRECT error.

Commit #0 (~1.00s): Next.js's RedirectErrorBoundary catches the NEXT_REDIRECT. It renders HandleRedirect, which calls router.replace('/users/1') inside a useEffect + React.startTransition. This commit also mounts the layout tree, including SendAnalyticsData components.

Layout effects fire after commit #0:

SendAnalyticsData with value="root-layout" (from /app/layout.tsx) fires its useEffect, calling the server action sendAnalyticsData("root-layout")

SendAnalyticsData with value="users-layout" (from UserLayout.tsx via /app/users/layout.tsx) fires its useEffect, calling sendAnalyticsData("users-layout")

Server actions trigger Router state updates: Each server action call goes through Next.js's callServer → dispatchAppRouterAction → actionQueue.dispatch() → setState(deferredPromise) on the Router's internal state. This sets the Router's useActionQueue state to a Promise (thenable).

The HandleRedirect effect also fires, calling router.replace('/users/1') which dispatches a navigate action — another setState(promise) on the Router.

Commit #1 (~1.05s): The Router re-renders. Now its useActionQueue hook encounters isThenable(state) === true, so it calls use(state) (React 19's use() API). The use() call suspends because the promise is still pending. However, the interaction between:

Multiple concurrent setState calls (from 2 server actions + 1 redirect navigate)

The use() suspension inside useActionQueue

The error boundary recovery from the NEXT_REDIRECT

...causes React to see a different number of hooks between the Router's previous render and this one. The Router throws "Rendered more hooks than during the previous render." This is caught by the top-level ErrorBoundaryHandler, which renders DefaultGlobalError (the "Something went wrong" page from global-error.tsx).

Commit #2 (~3.05s): The promises resolve, Suspense resumes, the redirect to /users/1 completes, and the correct page renders.

The root cause is the SendAnalyticsData components calling server actions during the same render cycle as a redirect. The server actions dispatch state updates to the Router's action queue while the redirect is also dispatching a navigate action. This concurrent mutation of the Router's state during error boundary recovery triggers the hooks mismatch error.

Specifically:

SendAnalyticsData in the root layout (/app/layout.tsx:24) — fires on every page load

SendAnalyticsData in the users layout (UserLayout.tsx:6) — fires when /users/* loads

Both call sendAnalyticsData() (a server action) from useEffect, which pushes promises into the Router's action queue at exactly the wrong time — while a NEXT_REDIRECT is being processed.

3. Suggested Fix

Option A (Recommended): Don't use server actions for analytics — use a regular API call or fire-and-forget fetch instead.

Server actions in Next.js are tightly coupled to the Router's action queue. Using them for fire-and-forget analytics is an anti-pattern because they trigger Router re-renders. Replace with a plain fetch:

typescript
// src/actions/analytics.ts — convert from server action to API utility
// REMOVE: 'use server';
export async function sendAnalyticsData(name: string) {
  // Use a regular fetch to an API route, or just log client-side
  await fetch('/api/analytics', {
    method: 'POST',
    body: JSON.stringify({ name }),
  });
}

Or simpler — just make the analytics call not go through the server action mechanism:

typescript
// src/components/SetupAnalytics.tsx
'use client';

import { useEffect } from 'react';

// Don't import a server action — use a plain function
function sendAnalyticsBeacon(value: string) {
  // navigator.sendBeacon, fetch to an API route, or third-party SDK
  console.log('analytics data sent', value);
}

export function SendAnalyticsData({ value }: { value: string }) {
  useEffect(() => {
    sendAnalyticsBeacon(value);
  }, [value]);

  return <></>;
}

Option B (Alternative): Wrap the server action call in startTransition with lower priority.

This doesn't fix the fundamental issue but may work around the timing:

typescript
'use client';

import { useEffect, useTransition } from 'react';
import { sendAnalyticsData } from '@/actions/analytics';

export function SendAnalyticsData({ value }: { value: string }) {
  const [, startTransition] = useTransition();

  useEffect(() => {
    startTransition(() => {
      sendAnalyticsData(value);
    });
  }, [value, startTransition]);

  return <></>;
}

However, Option B is fragile — it still couples analytics to the Router internals and may break again with future Next.js changes. Option A is the correct fix.

Option A is recommended because:

Analytics should be fire-and-forget, not coupled to the Router's action queue

Server actions are designed for mutations that need to update the UI (form submissions, data writes), not for side-effect-only telemetry

This eliminates the entire class of bugs where analytics firing causes Router state conflicts

4. Key Evidence Trail

Step	Tool	Finding
1. Visual timeline	`Screenshot` at 723ms, 1015ms, 1077ms, 4595ms	Confirmed "Something went wrong" page flashes between ~1015ms and ~3050ms
2. Error identification	`RecordingOverview`	Two error boundary catches: NEXT_REDIRECT at 1005ms, hooks error at 1050ms
3. React render analysis	`ReactRenders` (commits, commit detail)	Commit #0: redirect caught. Commit #1: Router throws hooks error → GlobalError renders
4. React exceptions	`ReactException`	Router fiber #854 threw "Rendered more hooks than during the previous render", caught by ErrorBoundaryHandler
5. Router source analysis	`ReadSource` on app-router.tsx, use-action-queue.ts	Router's `useActionQueue` conditionally calls `use(state)` when state is thenable (Promise)
6. Hit count analysis	`Logpoint` on Router entry, useActionQueue, `use(state)`	8 entries, 7 reached useActionQueue, `use(state)` suspended during hits 3-7 (~999-1046ms)
7. Server action path trace	`ReadSource` on app-call-server.ts, app-router-instance.ts	Server actions → `dispatchAppRouterAction` → `setState(promise)` on Router
8. Trigger identification	`ReactRenders` (trigger-detail, fiber-cause)	Commit #1 triggered by setState from effects fired after commit #0
9. App source review	Source files on disk	`SendAnalyticsData` calls server action from `useEffect`, present in root layout + users layout
10. Timing correlation	`Logpoint` on sendAnalyticsData calls	"root-layout" fires at 972ms, "users-layout" fires at 1005ms — same window as NEXT_REDIRECT

Bug #3 Agent #3: App Context and Skills

Agent #3’s instructions added some basic context about the app setup, and access to the same React / Next / TanStack skill files we used earlier.

And yet it did worse than Agent #2!

It started by reading the skills files, checked RecordingOverview and ReactRenders and some screenshots to establish a timeline, and read through the app sources. It did read <SendAnalyticsData> and looked for places that component was used.

However, it then got lost in the weeds of Router, useActionQueue, and use hook semantics, down to investigating whether use adds a new hook entry to the rendering fiber. It did at least start to question whether this bug was at the app level or internals level :)

Ultimately, its solution described the sequence of events fairly well, but its conclusion was to move the redirect into middleware, which isn’t correct.

This one’s a bit puzzling - maybe the extra skills gave it too much info on Next and React? :)

Total time: 16 minutes

Bug #3 Agent #3 Prompt

💡

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Investigation Methodology

Follow this structured 8-step investigation process:

Establish user experience — Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.

Map data flow — Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.

Build render timeline — Use React analysis tools to see the sequence of component renders, mounts, unmounts. Look for unexpected render patterns, error boundaries catching, Suspense transitions.

Compare scenarios explicitly — The bug says "navigating to /users shows an error flash then redirects." Compare what happens during this navigation vs what happens if you directly load /users/1. What's different?

Find proximate cause, then go deeper — Once you find what immediately causes the error, ask "but WHY does this happen?" Keep going deeper. The first cause you find is rarely the root cause. Evaluate whether your fix addresses just the symptom or the actual root cause.

Systematic elimination — If the error source isn't clear from tracing, try identifying which components are on the page and reasoning about which one might be causing the issue. The "hooks" error means a component is rendering with a different number of hooks between renders — which component could be conditionally calling hooks? Look at layout components, providers, and components that appear on multiple pages.

Consider framework idioms — Is the code using Next.js/React/TanStack patterns correctly? Are there known interactions between features (Suspense, redirects, Server Actions, etc.) that could cause this?

Red-team your conclusion — Before finalizing, critically evaluate: Does your fix actually address the root cause? Could it introduce new problems? Are there other components or pages with the same pattern?

Architecture Context

The following facts about the app's architecture are known from static analysis:

Framework: Next.js 14+ with App Router

Data fetching: TanStack Query for client-side data management

Component model: Mix of React Server Components (RSC) and Client Components ('use client')

Route structure: Uses loading.tsx files for Suspense fallbacks, layout.tsx for nested layouts

Validation: Zod schemas for API response validation

Layouts: Root layout (app/layout.tsx) and a nested users layout (app/users/layout.tsx) — both contain shared components

Redirect pattern: The /users route uses Next.js redirect() to send users to /users/[id]

Components in layouts: Both the root layout and users layout include utility components like analytics trackers

Technology Skills

Before investigating, read the following skill files for relevant framework patterns. Scan the headers first, then read sections that seem relevant to the bug (hooks error + redirect + error boundary):

~/projects/temp/react-skills/skills/react-best-practices.md — React patterns and anti-patterns (large file — scan headers first, read selectively)

~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns

~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and streaming rules in Next.js

~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including Suspense integration

Deliverable

Write a detailed bug report with:

Bug Description — What the user sees

Root Cause Analysis — What exactly causes the error, traced through the code

Suggested Fix — Concrete code changes to fix the root cause

Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #3 Solution

💡

Bug Report: "Rendered more hooks than during the previous render" on `/users` redirect

1. Bug Description

What the user sees:

When navigating to /users, the page briefly shows a loading skeleton (~400ms), then flashes a full-page error ("Application error: a client-side exception has occurred") for ~1.5 seconds, then recovers and loads /users/1 successfully.

The redirect to /users/1 is intentional behavior (non-admin users should only see their own profile). The error flash is the bug.

Timeline from recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2:

Time	What's visible
661ms	Loading skeleton (Suspense fallback)
1077ms	Full-page error: "Application error: a client-side exception..."
2561ms	Loading skeleton again (redirect target loading)
4595ms	User profile page for John Doe — success

2. Root Cause Analysis

The proximate cause

The error "Rendered more hooks than during the previous render" is thrown by React's hook reconciliation system inside the Next.js Router component (app-router.tsx:207). It is caught by ErrorBoundaryHandler and displayed as a full-page error via DefaultGlobalError.

The chain of events

Step 1: Server-side redirect triggers client-side navigation

src/app/users/page.tsx is a Server Component that calls Next.js redirect():

typescript
export default async function UsersPage() {
  const user = await getUser();
  redirect(`/users/${user.id}`);
}

This throws a NEXT_REDIRECT error that's caught by RedirectErrorBoundary, which renders HandleRedirect. HandleRedirect's useEffect dispatches a router.replace() action to the Router's action queue.

Step 2: Router state becomes a thenable

The router.replace() action causes the Router's state (managed by useActionQueue) to become a thenable (a Promise-like object representing the pending navigation). Inside useActionQueue:

javascript
return isThenable(state) ? use(state) : state;

When use(state) is called on a pending thenable, it suspends the Router component. During suspension, React records the hooks executed so far:

Hook #1: useState (useActionQueue L28)

Hook #2: useTransition (useAppDevRenderingIndicator L7) — dev-only

Hook #3: useEffect (useAppDevRenderingIndicator L9) — dev-only

Total: 3 hooks recorded on the fiber.

Step 3: Thenable resolves, but isThenable() still returns true

The navigation thenable resolves externally. On the next render, the state object is the same thenable — now resolved with {status, value}. But isThenable() only checks typeof promise.then === 'function', and a resolved thenable still has a .then method. So use(state) is called again.

Step 4: use() returns synchronously — no suspension

use() on an already-resolved thenable returns the resolved value immediately without suspending. This means useActionQueue returns normally, and the Router continues executing past it.

Step 5: Next hook triggers the error

The Router's next statement after useActionQueue is useMemo at line 207 (hook #4). React's updateWorkInProgressHook compares: the previous fiber (from the suspended render) had 3 hooks, but the current render is now requesting hook #4. React throws:

"Rendered more hooks than during the previous render."

Why this is fundamentally a timing/framework issue

The bug occurs because React's Suspense mechanism should normally re-render a subtree from scratch after suspension resolves — meaning the suspended fiber's hook count shouldn't be the baseline. However, the specific interaction between:

Next.js's useActionQueue calling use() on a resolved thenable

The dev-only useAppDevRenderingIndicator adding 2 extra hooks (increasing the hook count gap from 1→2 to 3→4+)

React's hook reconciler comparing against the suspended fiber's hooks

...creates a race condition where the hook count mismatch manifests as a user-visible error.

In production, useActionQueue has only 1 hook (useState). The same mechanism could theoretically apply (1 hook → 2 hooks), but production builds may handle the Suspense fiber lifecycle differently, or the timing may prevent the race.

The app's contribution to the bug

The app uses redirect() inside a Server Component page — a documented Next.js pattern. However, this pattern pushes the redirect through the full React rendering pipeline (RSC → Error Boundary → HandleRedirect → Router state update → suspension → resolution → crash). A middleware-level redirect would bypass React entirely.

3. Suggested Fix

Primary fix: Move redirect to middleware

Replace the Server Component redirect with a Next.js middleware redirect. This avoids React rendering entirely:

Create src/middleware.ts:

typescript
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  // Redirect /users to /users/1 (non-admin users see only their profile)
  // In a real app, get user ID from session/auth token
  if (request.nextUrl.pathname === '/users') {
    const userId = 1; // TODO: extract from auth session
    return NextResponse.redirect(new URL(`/users/${userId}`, request.url));
  }
}

export const config = {
  matcher: '/users',
};

Simplify src/app/users/page.tsx:

typescript
// This page should never render — middleware redirects /users to /users/[id]
// Keep as a fallback in case middleware is bypassed
import { redirect } from 'next/navigation';
import { getUser } from '@/services/user';

export default async function UsersPage() {
  const user = await getUser();
  redirect(`/users/${user.id}`);
}

Alternative fix: Add a loading.tsx at the /users level

If the middleware approach isn't suitable, adding a src/app/users/loading.tsx file would provide a proper Suspense boundary that might alter the suspension/resolution timing enough to avoid the race condition. However, this is a mitigation, not a fix — it papers over the framework-level timing issue:

typescript
// src/app/users/loading.tsx
export default function Loading() {
  return <div>Redirecting...</div>;
}

Framework-level note

This appears to be a bug in Next.js's useActionQueue (or possibly in React's Suspense hook reconciliation). The useAppDevRenderingIndicator hook being called conditionally based on NODE_ENV with an eslint-disable-next-line react-hooks/rules-of-hooks comment is a red flag — it means the hook count differs between dev and production, making dev more susceptible to this class of bugs. This could be worth reporting to the Next.js team.

4. Key Evidence Trail

Step	Tool	Finding
1. Establish UX	`RecordingOverview`	2 errors: NEXT_REDIRECT (expected) + "Rendered more hooks" (bug). 4 React commits.
2. Visual timeline	`Screenshot` at 661ms, 1077ms, 2561ms, 4595ms	Loading → Error flash → Loading → Success. Error visible for ~1.5s.
3. Error source	`ReactRenders` commit analysis	Commit #1 (1.05s): ErrorBoundaryHandler caught the hooks error from Router component.
4. Console error	`ConsoleMessages` message-detail	Stack trace shows error in Router during commitLayoutEffects phase.
5. Router source	`ReadSource` + `grepika_get` on app-router.tsx	Mapped all 11+ hooks in Router. Identified `useActionQueue` at L204 as the first hook.
6. useActionQueue source	`grepika_get` on use-action-queue.js	Found `isThenable(state) ? use(state) : state` pattern. Found dev-only `useAppDevRenderingIndicator` adding 2 extra hooks.
7. use() is not a hook	`grepika_get` on react-dom-client.development.js	`useThenable` uses separate `thenableIndexCounter`, NOT the hook linked list. Confirmed `use()` doesn't create hook entries.
8. Hook count verification	`Logpoint` on useActionQueue L53	Tracked `hasThen` across 7 renders: false→false→true(suspend)→true(suspend)→true(resolved, no suspend)→...
9. Render progression	`Logpoint` on Router L207, L222, L228	Confirmed: suspended renders hit 3 hooks; post-resolution render hits hook #4 (useMemo) → error.
10. Hit count validation	`ReadSource` execution annotations	L204: 7 hits, L205: 4 hits (3 suspended), L207: 4 hits, L222: 2 hits — confirms 2 renders errored at hook #4.
11. App code review	`grepika_get` on all app source files	Confirmed redirect in Server Component, no middleware, no loading.tsx at /users level.
12. Error mechanism	`grepika_get` on react-dom updateWorkInProgressHook	Confirmed: `null === nextCurrentHook` → "Rendered more hooks than during the previous render" when fiber.alternate has fewer hooks.

Bug #3: Initial Results

After three sessions, I had mixed results for this bug. The good news was that the Replay recording was enough for Agent #2 to actually find the real no-kidding issue! That’s a huge step forward over Nadia’s results. Still, I wanted to see if there was anything else I could do to improve the analysis here.

Let’s say this was a purpose-built debugging agent, running against a known CI failure. In that case, we’d be able to provide some additional context to describe the app and it’s structure. That might be useful.

Beyond that, the two agents that didn’t get the answer spent too much time diving down into the internals of Next and React. There was useful information there, but that’s also a rabbit hole that trapped them. What if we gave directions to limit any internals investigation to “20%” effort, and focus 80% on the app itself? Would that curb the tendencies to dive too far?

Bug #3 Agent #4: App Context and 80/20 Guidance

I kicked off Agent #4 with a key update to the previous instructions:

💡

Follow this structured investigation process. Budget your time: spend ~80% on app code, component tree, effects, and state flow. Spend at most ~20% reading framework internals. If you find yourself multiple levels deep in Next.js router source or React reconciler code, stop and ask: "Do I already know enough to identify which app component is the trigger?"

I also refined the investigative process steps to encourage it to focus on actionable results, and to lead the report with a focus on the trigger.

As usual, Agent #4 started by calling RecordingOverview , saw the redirect error, checked some screenshots, used ReactRenders to check some of the render causes, and also read ReactExceptions to get more details on the error.

However, it then focused on reading the source code for the components… and immediately identified that <SendAnalyticsData> was calling a Server Action inside of a useEffect!

From there, it dug further into the dependency graph triggers for the renders. Then it examined the network requests, found the analytics POSTs, and confirmed the timing compared to the redirect error. It did start to dive into React, but actually stopped and explicitly said “I’ve been spending too much time on React internals - let me step back and focus on the app-level mechanism”. Clearly that part of the instructions worked :)

The final solution and fix were exactly right! It pointed to the Server Action in a useEffect running while Next was mid-redirect, gave the interleaved timeline of events, and specifically recommended using fetch or navigator beacons instead of a Server Action.

Total time: 10 minutes

Bug #3 Agent #4 Prompt

💡

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Investigation Methodology

Step 1: Establish user experience

Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.

Step 2: Map data flow

Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.

Step 3: Build render timeline

Use React analysis tools to see the sequence of component renders, mounts, unmounts, and effects. Look for unexpected render patterns, error boundaries catching, Suspense transitions. Pay special attention to what effects fire and what state changes they trigger — effects that kick off async work (fetches, server calls, state updates) during error recovery or redirect handling are common sources of timing bugs.

Step 4: Diff from working

Compare what happens on the broken route (/users → error flash → redirect) vs a working route (directly loading /users/1). What components, effects, or actions fire on the broken path that don't fire on the working path? That delta is where the trigger lives. Use React render commits, effect traces, and network requests to build both timelines and compare them.

Step 5: Find the trigger, then explain the mechanism

Bugs often have two layers:

The trigger: What in the app's code causes this to happen? Which component, which hook, which effect, which function call?

The mechanism: Why does the framework/runtime react badly to that trigger?

Find the trigger first. Identify the specific app-level code whose presence or behavior causes the error. Then briefly explain the framework mechanism — but don't get lost in framework internals before you've identified the app-level trigger.

Step 6: Systematic elimination

If the trigger isn't clear from tracing, reason about which components are unique to the broken path. The "hooks" error means a component is rendering with a different number of hooks between renders. Look at:

Layout components that wrap the route

Providers and context components

Utility components (analytics, tracking, feature flags) that appear in layouts

Components that fire effects during mount

Which of these could be interacting with the redirect/error recovery in a way that changes the hook count?

Step 7: Consider framework idioms

Is the code using Next.js/React patterns correctly? Are there known problematic interactions between features (Suspense + redirects, Server Actions + error boundaries, effects during error recovery, etc.)?

Important: If the app code is using a documented framework pattern correctly and the error is caused by a framework-internal mechanism, say so explicitly. But still identify which specific app code triggers the framework bug and propose an app-level change that avoids the trigger.

Step 8: Red-team your fix

Before finalizing, critically evaluate:

Actionable fix test: If your fix is "move to middleware," "change the framework," or "restructure routing" — keep looking. There's probably an app-level change that removes the trigger without restructuring. Framework workarounds are plan B, not plan A.

Does your fix address the root trigger, or just avoid the code path?

Could your fix introduce new problems?

Are there other components or pages with the same pattern?

Architecture Context

The following facts about the app's architecture are known from static analysis:

Framework: Next.js 14+ with App Router

Data fetching: TanStack Query for client-side data management

Component model: Mix of React Server Components (RSC) and Client Components ('use client')

Route structure: Uses loading.tsx files for Suspense fallbacks, layout.tsx for nested layouts

Validation: Zod schemas for API response validation

Layouts: Root layout (app/layout.tsx) and a nested users layout (app/users/layout.tsx) — both contain shared components

Redirect pattern: The /users route uses Next.js redirect() to send users to /users/[id]

Components in layouts: Both the root layout and users layout include utility components like analytics trackers

Technology Skills

~/projects/temp/react-skills/skills/react-best-practices.md — React patterns and anti-patterns (large file — scan headers first, read selectively)

~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns

~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and streaming rules in Next.js

~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including Suspense integration

Deliverable

Write a detailed bug report with:

Bug Description — What the user sees

Root Cause Analysis — Structure as: Trigger (app-level code) then Mechanism (framework behavior). Lead with the trigger.

Suggested Fix — Concrete code changes targeting the app-level trigger. If a framework workaround is also needed, present it as a secondary option.

Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #4 Solution

💡

Bug Report: "Rendered more hooks than during the previous render" on `/users` redirect

1. Bug Description

When navigating to /users, the user sees a "Something went wrong" error page flash for ~2 seconds before being redirected to /users/1. The console logs: "Rendered more hooks than during the previous render."

Visual timeline:

Time	What the user sees
895ms	Blank/loading state
1015ms	Loading state transitioning
1077ms	"Something went wrong" error flash (DefaultGlobalError)
~3000ms	Navigation to `/users/1` begins
4595ms	User profile page renders correctly

2. Root Cause Analysis

Trigger (app-level code)

The SendAnalyticsData component in src/components/SetupAnalytics.tsx calls a Server Action (sendAnalyticsData) inside a useEffect on mount. This component is rendered in two layouts:

src/app/layout.tsx:24 — <SendAnalyticsData value="root-layout" />

src/components/UserLayout.tsx:6 — <SendAnalyticsData value="users-layout" />

When /users loads, the Server Component src/app/users/page.tsx calls redirect('/users/1'), which throws a NEXT_REDIRECT error. Next.js catches this via RedirectErrorBoundary. However, the layout components above the page still mount and commit during this render — including both SendAnalyticsData instances.

Their useEffects fire immediately after commit, calling the sendAnalyticsData Server Action. These Server Action calls return RSC flight data that triggers a setState on the Next.js Router component — while the Router is already mid-redirect.

Mechanism (framework behavior)

The sequence is:

Commit #0 (1.00s): /users page renders. NEXT_REDIRECT is thrown and caught by RedirectErrorBoundary. Both SendAnalyticsData components mount in the layout tree above the error boundary.

Effects fire (1.005–1.008s): Both SendAnalyticsData useEffects execute, calling sendAnalyticsData() Server Action. This produces two POST requests (to /users and /users/1).

Server Action responses arrive (~1.05s): The responses contain RSC flight data. Next.js processes this and calls setState on the Router component to update the RSC tree.

Commit #1 (1.05s): The Router re-renders, but it is now in an inconsistent state — the redirect has partially torn down the component tree, and the Router's hook count changes between renders. React throws "Rendered more hooks than during the previous render".

ErrorBoundaryHandler catches this, rendering DefaultGlobalError — the error flash the user sees.

Commit #2 (3.05s): The redirect to /users/1 eventually completes, the full tree re-renders, and the app recovers.

In short: Calling a Server Action from a useEffect in a layout that wraps a redirecting page causes an RSC state update that corrupts the Router during redirect processing.

3. Suggested Fix

Primary fix — don't call Server Actions from layout effects during redirect-sensitive renders.

The simplest fix is to replace the Server Action call with a plain fetch() (or navigator.sendBeacon()) for analytics. This avoids the RSC flight data response that corrupts the Router state:

typescript
// src/components/SetupAnalytics.tsx
'use client';

import { useEffect } from 'react';

export function SendAnalyticsData({ value }: { value: string }) {
  useEffect(() => {
    // Use fetch() or sendBeacon() instead of a Server Action.
    // Server Actions return RSC flight data that can interfere
    // with the Next.js Router during redirect processing.
    fetch('/api/analytics', {
      method: 'POST',
      body: JSON.stringify({ name: value }),
      keepalive: true, // ensures delivery even on navigation
    });
  }, [value]);

  return <></>;
}

With a corresponding API route:

typescript
// src/app/api/analytics/route.ts
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { name } = await request.json();
  console.log('analytics data sent', name);
  return NextResponse.json({ ok: true });
}

Why this works: A plain fetch() response is just JSON — it doesn't carry RSC flight data and doesn't trigger setState on the Router. The analytics fires the same way, but the response can't corrupt React's component tree.

Alternative fix — if you want to keep using a Server Action, you can suppress the return value by wrapping it so it doesn't trigger Router updates:

typescript
// src/components/SetupAnalytics.tsx
'use client';

import { useEffect, useTransition } from 'react';
import { sendAnalyticsData } from '@/actions/analytics';

export function SendAnalyticsData({ value }: { value: string }) {
  const [, startTransition] = useTransition();

  useEffect(() => {
    startTransition(async () => {
      await sendAnalyticsData(value);
    });
  }, [value, startTransition]);

  return <></>;
}

However, this still goes through the Server Action machinery and may still carry risk in edge cases. The fetch()/sendBeacon() approach is more robust.

4. Key Evidence Trail

Step	Tool	Finding
Screenshot timeline	`replay_Screenshot` at 895ms, 1015ms, 1077ms, 4595ms	Error flash visible at 1077ms, resolves by 4595ms
Recording overview	`replay_RecordingOverview`	1 console error at 1050ms: "Rendered more hooks". 2 error boundary catches: NEXT_REDIRECT at 1005ms, hooks error at 1050ms
React render commits	`replay_ReactRenders` (commits mode)	Commit #0: redirect caught + layouts mount. Commit #1: Router throws hooks error 50ms later
Trigger-detail for commit #1	`replay_ReactRenders` (trigger-detail)	Dependency chain: effect → setState → Router re-render → crash
Network requests at error time	`replay_NetworkRequest` (detail)	Two POST Server Action calls at 1008-1010ms with `next-action` headers. Bodies: `["root-layout"]`, `["users-layout"]`
Effect execution timing	`replay_Logpoint` on SetupAnalytics.tsx:9	5 hits total. Hits at 972ms ("root-layout") and 1005ms ("users-layout") fire during redirect processing window
Dependency chain analysis	`replay_DescribePoint` with `dependencyChain: true`	Confirmed: effect from commit #0 → Server Action call → setState on Router → hooks error
Source code review	File reads of layout.tsx, UserLayout.tsx, SetupAnalytics.tsx, analytics.ts	SendAnalyticsData in both root and users layouts calls Server Action in useEffect

5. Why Other Components Aren't the Problem

The SendAnalyticsData in UserPage.tsx:221 is not part of this bug — UserPage only renders after the redirect completes (commit #3 at 3.89s). The problem is specifically the two instances in layout components that persist across the redirect boundary — they mount during commit #0 when the redirect is being processed, and their effects fire Server Actions whose responses corrupt the Router.

Bug #3 Conclusions

This one was fascinating! Let’s see how the results looked using that updated comparison rubric:

	Run 1 (baseline)	Run 2 (+methodology)	Run 3 (+ctx+skills)	Run 4 (best CI)
Did it fix the problem?	❌	✅	❌	✅
Did it identify the root cause correctly?	❌	✅	❌	✅
Did it investigate beyond the first fix?	⚠️ Wrong direction	✅	⚠️ Wrong direction	✅
Did it check the rest of the codebase?	❌	✅	Partial	✅
Did it produce the correct architectural fix?	❌	✅	❌	✅

With Bug #2, we saw a pretty consistent stairstep of improvements as we added more context and skills. Here, the agent either nailed everything or got lost in the weeds.

Agent #3 is still the one that confuses me the most. It had the same process instructions as #2, but with added app context and React skills. Somehow that led it off into the guts of React and it never actually figured out the app-level issue.

I was very pleased to see that the instructions to focus on app concerns had the desired results! In fact it was hilarious to read the LLM actually saying “oops I’m going too deep, back up!”

Overall Results

Nadia’s LLM runs got partial credit on Bug #2 (the double loading skeleton bug), and failed completely on Bug #3 (the broken redirect + Server Actions call).

By adding Replay recordings and improving the prompts with better investigative instructions, I saw my agents go from a partial fix of Bug #2 to nailing the exact cause and solution, and two of the four runs for Bug #3 also come up with the exact cause and solution as well!

This was very exciting! Of course I hoped that adding Replay would produce better results - that’s why I tried doing this in the first place. But I had no way of knowing for sure without actually running the experiment and seeing what happened.

Replay MCP’s Tools Were Critical

In all these explorations, the agents heavily used Replay MCP to understand both the app runtime behavior and Next + React’s internals. They consistently oriented themselves with RecordingOverview to surface app details, errors, and renders; used Screenshot to visualize what the app looked like at various points; and dug into behavior details with ReactRenders and NetworkRequests.

For the deep-dive exploration, the agents used SearchSources to find relevant files and ReadSource to view sources with hit counts, Then they dug deeper with numerous Logpoint and Evaluate calls to dynamically introspect behavior at key lines and hits.

As we saw from Nadia’s post and a couple of my example agent runs, just having the source code available wasn’t enough to solve these issues! Reading the app source was important, and it helped the agent understand the structure of the app codebase and find things like the Server Action. Both parts were necessary. But without the runtime data from the Replay recording, the agents could never have understood the full picture of the apps behavior, and wouldn’t have been able to nail the results.

Prompting and Context Still Matter

As I talked about in the previous post on Dan’s useDeferredValue bug: I really was surprised just how much the prompt and instructions mattered here, and how much of a difference they made!

I’m sure there’s plenty of academic studies out there that talk about all these aspects in more scientific detail. I’m just one dev who tried running some experiments locally :) But this really has impressed upon me the importance of providing good context and instructions for any autonomous agent investigative work. (And, for that matter, helps explain why my own personal hands-on agentic development workflow seems to produce good results for me.)

Replay MCP: The Time-Travel Superpower Your Agents Need

As I said in the last post: I joined Replay because I saw the potential and promise of time-travel debugging and how it can make the debugging process drastically faster and better. It enables solving bugs that are otherwise impossible, and provides insights into real runtime behavior that no other monitoring or analysis tool can provide.

Replay DevTools gave humans the ability to investigate and solve bugs with time travel.

Now, Replay MCP gives your agents the same time-travel superpowers.

And, even better: we’ve updated our Replay Github bot to automatically trigger time-travel analysis on failed E2E tests, and provide root-cause and a recommended fix as a comment on your PR! This will help your team fix flaky tests, ensure PR code quality, and improve shipping velocity. One step closer to self-healing CI/CD!

Try out Replay MCP and our E2E Test Suites integrations today, and stay tuned - we’ve got a lot more awesome time-travel debugging capabilities coming soon!

Replay Time Travelogue: Improving Nadia’s “Debugging with AI” Results Using Replay MCP

Background: Nadia’s “Debugging with AI” Example

Agentic Time-Travel Debugging with Replay MCP

Investigation Process and Setup

Bug #1: Zod Schema Error in a User Profile

Task: Debug a Bug in a Replay Recording

Your Mission

Approach

Deliverable

Bug Report

1. Bug Description

2. Root Cause

3. Suggested Fix

4. Key Evidence Trail

Bug #2: User Page Double Loading Skeleton

Bug #2 Agent #1: Basic Prompt

Task: Debug a Bug in a Replay Recording

Your Mission

Tools Available

Deliverable

Bug Report: Double Loading Skeleton on Client-Side Navigation to User Page

1. Bug Description

2. Root Cause

3. Suggested Fix

4. Key Evidence Trail

Bug #2 Agent #2: Investigative Process

Task: Debug a Bug in a Replay Recording

Your Mission

Tools Available

Investigation Process

Deliverable

Bug Report: Double Loading Skeleton on User Page Navigation

1. Bug Description

2. Root Cause Analysis

3. Suggested Fix

4. Key Evidence Trail

5. Remaining Concerns

Bug #2 Agent #3: App Architecture Summary

Task: Debug a Bug in a Replay Recording

Your Mission

Tools Available

Application Context

Investigation Process

Deliverable

Bug Report: Double Loading Skeleton on Navigation to User Page

1. Bug Description

2. Root Cause Analysis

3. Suggested Fix

Primary fix: Switch UserPage to use useUserSuspense()

Same pattern applies to other pages

4. Key Evidence Trail

5. Remaining Concerns

Bug #2: Initial Results

Bug #2 Agent #4: Skill Files

Task: Debug a Bug in a Replay Recording

Your Mission

Tools Available

CI Context

Application Context

Technology Skills

Investigation Process

Deliverable

1. Bug Description

2. Root Cause Analysis

Loading mechanism #1: Next.js route-level loading.tsx

Loading mechanism #2: Component-level isLoading check

Why navigation shows two skeletons:

Why refresh shows only one skeleton:

The architectural mismatch:

3. Suggested Fix

Immediate fix: Switch to useSuspenseQuery

Deeper fix: Eliminate the waterfall with server-side prefetching

4. Key Evidence Trail

5. Remaining Concerns

Bug #2 Conclusions

Bug #3: Next.js Redirect Error

Bug #3 Agent #0: No Recording

Bug #3 Agent #1: Basic Instructions

Bug Report

Recording

Primary fix: Switch `UserPage` to use `useUserSuspense()`

Loading mechanism #1: Next.js route-level `loading.tsx`

Loading mechanism #2: Component-level `isLoading` check

Immediate fix: Switch to `useSuspenseQuery`

Bug Report: "Rendered more hooks than during the previous render" on `/users` page

Bug Report: "Something went wrong" flash on `/users` redirect

Bug Report: "Rendered more hooks than during the previous render" on `/users` redirect

Bug Report: "Rendered more hooks than during the previous render" on `/users` redirect